Abstract

Pseudo-additive entropies (PAE) cannot be made a basis for either a generalized
statistical mechanics or a generalized information theory. Either statistical
independence must be waived, or the expression of the averaged conditional
entropy as the difference between the joint and marginal entropies must be
relinquished. The same inequality that relates the PAE to the Rényi entropy,
when applied to the mean code length, produces an expression that is unbounded
as the order of the code length approaches infinity. Since the mean code length
associated with the Rényi entropy is finite, and can be made to come as close
to the Hartley entropy as desired in the same limit, the PAE have a more
limited range of validity than the Rényi entropy which they approximate.
1 Problems with PAE

The entropy measure of degree-$\alpha$ ^El and the $\alpha$-norm entropy measure [312] are
PAE insofar as they satisfy a pseudo-additive relation, rather than the additive
relation satisfied by the Shannon [5] and Rényi [S] entropies. Additivity has
been confused with thermodynamic extensivity, and a whole new branch of
'nonextensive thermodynamics' has emerged based on the entropy measure of
degree-$\alpha$ [7], the basis of which is the pseudo-additive relation that expresses
the joint entropy in terms of the sums and products of the marginal entropies.
The lack of additivity, even for statistically independent events, has given rise
to speculative claims regarding correlations induced by the lack of additivity
(extensivity).

We shall show that PAE measures of uncertainty are upper or lower bounds
on the Rényi entropy, depending upon the value of the parameter $\alpha$, and are
incompatible with the rate of transmission of information and the channel capacity
expressed in terms of the difference between marginal and averaged conditional
entropies, or, equivalently, as the difference between the sum of marginal
entropies and the joint entropy. The reasons are:

1. The averaged conditional entropies are not bounded from above by the marginal
entropies, a bound that should be attained for statistically independent events.

2. The conventional rate of transmission of information and channel capacity
can be negative for values of the characteristic exponent $\alpha < 1$, do
not vanish for statistically independent input and output, and do not vanish even
when the probabilities of correct and incorrect transmission are equal, so
that the channel does not convey any information at all.

3. The same approximation that relates the PAE to the Rényi entropy, when
applied to the mean code length associated with the Rényi entropy, leads
to an unbounded mean code length in the $\alpha = 0$ limit, whereas the Rényi mean
code length can be made to come as close to the Hartley entropy as desired by
the use of extensions of a code.

The PAE will be shown to be related to the Rényi entropy by the inequality
$\ln x \le x - 1$, and have no claim in themselves to be considered as genuine
expressions of uncertainty for the reasons listed above. Moreover, any viable
candidate for an entropy cannot depend upon a parameter that is different
for different systems, since such systems would be incommensurable.

2 PAE in information theory 

The conclusions drawn in this section follow from similar results found earlier by
Landsberg and Vedral [5], who attempted to insert entropies of degree-$\alpha$ into
an information theory framework. Specifically, they found that the channel
capacity does not vanish when the input and output are statistically independent,
or when the probabilities of correct and incorrect transmission are equal, and that
it can even become negative. They thought it "would be extremely surprising
and path-breaking to find a physical system which conveys information in spite
of going through a completely destructive channel." We shall see that
these conclusions are based on the conventional forms of transmission rate and
channel capacity that are used for additive entropies. Their expression for the
averaged conditional entropy coincides with what Aczél and Daróczy [10] have
termed the property of 'strong additivity of degree-$\alpha$' [vid. (3) below], but it
does not transform into the marginal entropy when the events are statistically
independent. A vestige of what was the conditioning variable will be shown to
be responsible for all the wrong results.

Consider a discrete set of random variables $X$ and $Y$ with probability
distributions $P = (p_1, \ldots, p_m)$ and $Q = (q_1, \ldots, q_n)$ over sets
$(x_1, \ldots, x_m)$ and $(y_1, \ldots, y_n)$, respectively. Similarly, the
two-dimensional random pair $(X, Y)$ has the joint probability distribution
$\Pi = (\pi_{11}, \ldots, \pi_{mn})$, where $\pi_{ij} = \Pr(X = x_i, Y = y_j)$.
The conditional probabilities $p_{ij}$ and $q_{ji}$ are defined by

$$\pi_{ij} = p_{ij} q_j = q_{ji} p_i.$$

Papers [11, 12] have been written proposing a set of properties that uniquely
characterize the so-called Tsallis entropy

$$S_1(P) = \frac{1 - \sum_{i=1}^{m} p_i^\alpha}{\alpha - 1},$$

in analogy with those that characterize the Shannon entropy. In addition to
the characterization of the entropy of degree-$\alpha$ found in the original papers,
a complete and unique characterization of this PAE was given in the 1975
monograph of Aczél and Daróczy [10].

At the top of this list is the property of pseudo-additivity of the joint entropy

$$\begin{aligned} S_i^0(x, y) &= S_i(x) + \lambda_i(x) S_i(y) \\ &= \lambda_i(y) S_i(x) + S_i(y) = S_i(x) + S_i(y) + \tau_i S_i(x) S_i(y), \end{aligned} \tag{1}$$

for statistically independent events, $x$ and $y$, where
$\lambda_1(x) = \sum_{i=1}^{m} p_i^\alpha$,
$\lambda_2(y) = \left( \sum_{j=1}^{n} q_j^\alpha \right)^{1/\alpha}$,
$\tau_1 = 1 - \alpha$ and $\tau_2 = (1 - \alpha)/\alpha$. As it stands, the
pseudo-additivity of these entropies, (1), appears as an ad hoc weighting of one of the
marginal entropies by a factor $\lambda_i$. It is only when we consider the averaging of
the conditional entropy that these weights make some physical sense.
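
As a numerical illustration of (1), the following Python sketch checks the pseudo-additive relation for two statistically independent distributions. It assumes the forms $S_1(P) = (1 - \sum_i p_i^\alpha)/(\alpha - 1)$ and $S_2(P) = [(\sum_i p_i^\alpha)^{1/\alpha} - 1]/\tau_2$; the function names and the sample probabilities are illustrative choices and do not come from the text.

```python
import math

def s1(p, alpha):
    """Entropy of degree-alpha, assumed form (1 - sum p_i^alpha)/(alpha - 1)."""
    return (1.0 - sum(v ** alpha for v in p)) / (alpha - 1.0)

def s2(p, alpha):
    """alpha-norm entropy, assumed form ((sum p_i^alpha)^(1/alpha) - 1)/tau_2."""
    tau2 = (1.0 - alpha) / alpha
    return (sum(v ** alpha for v in p) ** (1.0 / alpha) - 1.0) / tau2

def independent_joint(p, q):
    """Joint distribution pi_ij = p_i q_j of statistically independent x and y."""
    return [pi * qj for pi in p for qj in q]

p = [0.5, 0.3, 0.2]   # hypothetical distribution of x
q = [0.6, 0.4]        # hypothetical distribution of y
alpha = 0.7
tau1 = 1.0 - alpha
tau2 = (1.0 - alpha) / alpha
joint = independent_joint(p, q)

# Left- and right-hand sides of the pseudo-additive relation for each entropy.
lhs1 = s1(joint, alpha)
rhs1 = s1(p, alpha) + s1(q, alpha) + tau1 * s1(p, alpha) * s1(q, alpha)
lhs2 = s2(joint, alpha)
rhs2 = s2(p, alpha) + s2(q, alpha) + tau2 * s2(p, alpha) * s2(q, alpha)

# Equivalent weighted form: S(x) + lambda(x) S(y), with lambda_1(x) = sum p_i^alpha.
lam1_x = sum(v ** alpha for v in p)
weighted1 = s1(p, alpha) + lam1_x * s1(q, alpha)

print(math.isclose(lhs1, rhs1), math.isclose(lhs2, rhs2), math.isclose(lhs1, weighted1))
```

The weighted form shows where the factor $\lambda_i$ comes from: it is $1 + \tau_i S_i$ evaluated on the other variable.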

The averaged conditional entropy is required to be the difference between
the joint and marginal entropies:

$$E[S_i(y|X)] = S_i(y|x) = S_i(x, y) - S_i(x), \tag{2}$$

where $E$ denotes the mathematical expectation. The equality asserts that the
uncertainty of $y$, when $x$ is known, is equal to the uncertainty (entropy) of the
joint event, $x, y$, less the uncertainty of $x$ [5].

This demands that the averaging be performed by what has come to be
known as averaging with respect to (unnormalized) 'escort' probabilities [13].
As a result, (2) will coincide with the property of strong additivity of degree-$\alpha$
[10]. The strong additivity relations of the PAE are

$$S_1(x|y) = \sum_{j=1}^{n} q_j^\alpha S_1(x|y_j) = S_1(x, y) - S_1(y), \tag{3}$$

for the entropy measure of degree-$\alpha$, and

l/a 

^(■'- .'/)=!>: '// 1 S 2 (x\ yj ) = S 2 (x,y) - S 2 (y) (4) 



for the $\alpha$-norm entropy measure.
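
The strong additivity relation (3) for the entropy of degree-$\alpha$ can be checked numerically. The sketch below assumes $S_1(P) = (1 - \sum_i p_i^\alpha)/(\alpha - 1)$ and uses a made-up joint distribution; it averages the conditional entropies $S_1(x|y_j)$ with the unnormalized escort weights $q_j^\alpha$ and compares the result with $S_1(x, y) - S_1(y)$. It does not address the $\alpha$-norm relation (4).

```python
import math

def s1(p, alpha):
    """Entropy of degree-alpha, assumed form (1 - sum p_i^alpha)/(alpha - 1)."""
    return (1.0 - sum(v ** alpha for v in p)) / (alpha - 1.0)

# Hypothetical joint distribution pi[i][j] = Pr(X = x_i, Y = y_j); entries sum to 1.
pi = [[0.30, 0.10],
      [0.05, 0.25],
      [0.10, 0.20]]
alpha = 0.7

q = [sum(row[j] for row in pi) for j in range(2)]   # marginal distribution of y
flat = [v for row in pi for v in row]               # joint distribution as a list

# Unnormalized escort average of S_1(x | y_j) with weights q_j^alpha.
averaged = 0.0
for j, qj in enumerate(q):
    conditional = [row[j] / qj for row in pi]       # p_ij = pi_ij / q_j
    averaged += qj ** alpha * s1(conditional, alpha)

difference = s1(flat, alpha) - s1(q, alpha)         # S_1(x, y) - S_1(y)
print(averaged, difference, math.isclose(averaged, difference))
```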

Any putative expressions for the averaged conditional and joint entropies
must satisfy the conditions [S]:

$$S_i(y|x) \le S_i(y) \tag{5}$$

and

$$S_i(x, y) \le S_i(x) + S_i(y). \tag{6}$$

Expressed in words, (5) says that the uncertainty of $y$ is never increased by a
knowledge of $x$, and (6) affirms that the uncertainty of a joint event can never
be greater than the sum of the individual uncertainties.

As a consequence of pseudo-additivity (1), conditions (5) and (6) are replaced
by:

$$S_i(y|x) \le \lambda_i(x) S_i(y), \tag{7}$$

and

$$S_i(x, y) \le S_i(x) + S_i(y) \le S_i^0(x, y) - \tau_i S_i(x) S_i(y), \tag{8}$$


respectively, since the joint entropy is greatest when the events are statistically
independent. According to (7), it is not true for $\alpha < 1$ ($\lambda_i > 1$) that the
averaged conditional entropy is inferior, or at most equal, to the marginal
entropy. When $x$ does not condition $y$, the right-hand side of (7) should be
independent of $x$, which it is not. A vestige of the conditioning variable, even
when it is no longer conditioning, demands a physical explanation. The joint
entropy must be greatest in the case of statistically independent events, and
(8) shows that for $\alpha > 1$ this may not necessarily be so. Since Landsberg and
Vedral based their analysis on the validity of the expression for the averaged
conditional entropy, (3), it is no wonder that they came to absurd conclusions
regarding the transmission rate and channel capacity when the entropy measure
of degree-$\alpha$ is used instead of additive entropies.
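
The vestigial dependence on the conditioning variable can be seen in a small numerical sketch. For statistically independent $x$ and $y$, the difference $S_1(x, y) - S_1(x)$ equals $\lambda_1(x) S_1(y)$, which exceeds the marginal entropy $S_1(y)$ for $\alpha < 1$ and falls below it for $\alpha > 1$. The definition of $S_1$ and the sample distributions are assumptions made for the illustration.

```python
import math

def s1(p, alpha):
    """Entropy of degree-alpha, assumed form (1 - sum p_i^alpha)/(alpha - 1)."""
    return (1.0 - sum(v ** alpha for v in p)) / (alpha - 1.0)

def averaged_conditional(p, q, alpha):
    """S_1(y|x) = S_1(x, y) - S_1(x) when x and y are statistically independent."""
    joint = [pi * qj for pi in p for qj in q]
    return s1(joint, alpha) - s1(p, alpha)

p = [0.5, 0.3, 0.2]   # hypothetical distribution of x
q = [0.6, 0.4]        # hypothetical distribution of y

results = {}
for alpha in (0.5, 2.0):
    lam_x = sum(v ** alpha for v in p)               # lambda_1(x)
    cond = averaged_conditional(p, q, alpha)
    # Store (averaged conditional, lambda_1(x) * S_1(y), marginal S_1(y)).
    results[alpha] = (cond, lam_x * s1(q, alpha), s1(q, alpha))
    print(alpha, results[alpha], math.isclose(cond, lam_x * s1(q, alpha)))
```

In neither case does the averaged conditional entropy reduce to the marginal entropy of $y$, even though $x$ carries no information about $y$.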

The expression for the averaged conditional entropy in (4) was rejected by
Boekee and Lubbe 0] precisely because it does not satisfy the equality in (5) in
the case of statistically independent events. A classical extension of Minkowski's
inequality [14], for $\alpha > 1$, was used to show that ordinary averaging, using
weights $p_i$, of the conditional $\alpha$-norm measure of uncertainty satisfies
condition (5). The price to be paid is that the averaged conditional entropy is
no longer the difference between the joint and marginal entropies.

The expressions for the rate of transmission of information and channel
capacity that should have been used are:

$$R = S_i(x) - \frac{S_i(x|y)}{\lambda_i(y)} \tag{9}$$

and

$$C = \max \left[ S_i(x) - \frac{S_i(x|y)}{\lambda_i(y)} \right], \tag{10}$$

respectively, where the maximum is with respect to all possible information
sources used as input to the channel. The division of the averaged conditional
entropies by the weighting factors $\lambda_i(y)$, together with the property of strong
additivity, (3) and (4), ensures that when the inputs and outputs are statistically
independent, or when there is equal probability for correct and incorrect
transmission, the transmission rate and the channel capacity vanish.
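
A short numerical sketch contrasts the two forms of the transmission rate for statistically independent input and output, using the entropy of degree-$\alpha$ with $\alpha < 1$. The conventional form $S_1(x) + S_1(y) - S_1(x, y)$ is negative, whereas the form in which the averaged conditional entropy is divided by $\lambda_1(y)$ vanishes. The definition of $S_1$, the variable names and the sample distributions are assumptions of the example.

```python
import math

def s1(p, alpha):
    """Entropy of degree-alpha, assumed form (1 - sum p_i^alpha)/(alpha - 1)."""
    return (1.0 - sum(v ** alpha for v in p)) / (alpha - 1.0)

p = [0.5, 0.3, 0.2]   # hypothetical input distribution
q = [0.6, 0.4]        # hypothetical output distribution, independent of the input
alpha = 0.5

joint = [pi * qj for pi in p for qj in q]
lam_y = sum(v ** alpha for v in q)                     # lambda_1(y)
cond_x_given_y = s1(joint, alpha) - s1(q, alpha)       # S_1(x|y) by strong additivity

# Conventional rate used for additive entropies.
conventional = s1(p, alpha) + s1(q, alpha) - s1(joint, alpha)
# Rate with the averaged conditional entropy divided by lambda_1(y).
modified = s1(p, alpha) - cond_x_given_y / lam_y

print(conventional, modified, math.isclose(modified, 0.0, abs_tol=1e-12))
```

For independent variables the conventional rate equals $-\tau_1 S_1(x) S_1(y)$, so its sign is fixed by the sign of $\tau_1 = 1 - \alpha$.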

However, this does not come without a cost. What is compromised is the
relation between the marginal and joint entropies, in that the transmission
rate is proportional to the physically uninterpretable difference
$\lambda_i(y) S_i(x) + S_i(y) - S_i(x, y)$, rather than the difference between the sum of
marginal entropies and the joint entropy, $S_i(x) + S_i(y) - S_i(x, y)$, as it would be
were the entropies additive p. 38]. Although the transmission rate, (9), and hence
the channel capacity, (10), appear to be increased (decreased) beyond their
conventional values for $\alpha < 1$ ($\alpha > 1$), they can never become negative in the
range $\alpha < 1$, as they would, above all for statistically independent input and
output, were the conventional forms of $R$ and $C$ for additive entropies to be employed.

The conclusion that we are forced into accepting is that the entropy measure
of degree-$\alpha$, as well as the $\alpha$-norm entropy measure, cannot be used as valid
expressions for a true entropy. We will now show that they are related to an
interpolation formula which varies between the Shannon entropy when $\alpha = 1$ and
the Hartley entropy when $\alpha = 0$. It is quite remarkable that the latter does not
require equal probabilities, as the PAE would, in the $\alpha = 1$ limit rather than
the $\alpha = 0$ limit.

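The interpolation formula is the Rényi measure of uncertainty:

$$S_R(P) = \frac{1}{\tau_1} \ln \sum_{i=1}^{m} p_i^\alpha = \frac{1}{\tau_2} \ln \left( \sum_{i=1}^{m} p_i^\alpha \right)^{1/\alpha},$$
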
which we have written in a form that displays the intimate relation between
the entropy measure of degree-$\alpha$ and the $\alpha$-norm measure. On the strength of
the inequality $\ln x \le x - 1$, it is readily seen that the Rényi entropy can never be
greater than the PAE for $\alpha < 1$, and never smaller than the PAE for $\alpha > 1$ [3].
The averaged conditional Rényi entropy,

$$S_R(y|x) = S_R(x, y) - S_R(x) = \frac{1}{1 - \alpha} \ln \sum_{i=1}^{m} \frac{p_i^\alpha}{\sum_{k=1}^{m} p_k^\alpha} \sum_{j=1}^{n} q_{ji}^\alpha, \tag{11}$$


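looks like a normalized escort average of the conditional probabilities to the
power $\alpha$. Applying the inequality $\ln x \le x - 1$ to the right-hand side of (11) we
get

$$S_R(y|x) \le \frac{1}{\tau_2} \, \frac{\left( \sum_{j=1}^{n} \sum_{i=1}^{m} \pi_{ij}^\alpha \right)^{1/\alpha} - \left( \sum_{i=1}^{m} p_i^\alpha \right)^{1/\alpha}}{\left( \sum_{i=1}^{m} p_i^\alpha \right)^{1/\alpha}} = \frac{S_2(x, y) - S_2(x)}{\lambda_2(x)}$$
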
for $\alpha < 1$, and the reverse inequality for $\alpha > 1$. Alternatively, this can be written
in terms of the difference of the joint and marginal entropies of degree-$\alpha < 1$,

$$S_R(y|x) \le \frac{1}{1 - \alpha} \, \frac{\sum_{j=1}^{n} \sum_{i=1}^{m} \pi_{ij}^\alpha - \sum_{i=1}^{m} p_i^\alpha}{\sum_{i=1}^{m} p_i^\alpha} = \frac{S_1(x, y) - S_1(x)}{\lambda_1(x)}. \tag{12}$$
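
The bounds that follow from $\ln x \le x - 1$ can be illustrated numerically. The sketch below assumes $S_R(P) = \ln(\sum_i p_i^\alpha)/(1 - \alpha)$ and $S_1(P) = (1 - \sum_i p_i^\alpha)/(\alpha - 1)$, and uses a made-up joint distribution. It compares the Rényi entropy with the entropy of degree-$\alpha$ on either side of $\alpha = 1$, and evaluates the gap between the right-hand side of (12) and the averaged conditional Rényi entropy.

```python
import math

def s1(p, alpha):
    """Entropy of degree-alpha, assumed form (1 - sum p_i^alpha)/(alpha - 1)."""
    return (1.0 - sum(v ** alpha for v in p)) / (alpha - 1.0)

def renyi(p, alpha):
    """Renyi entropy of order alpha (natural logarithm)."""
    return math.log(sum(v ** alpha for v in p)) / (1.0 - alpha)

# Hypothetical joint distribution pi[i][j] = Pr(X = x_i, Y = y_j).
pi = [[0.30, 0.10],
      [0.05, 0.25],
      [0.10, 0.20]]
flat = [v for row in pi for v in row]
px = [sum(row) for row in pi]            # marginal distribution of x

def bound_gap(alpha):
    """Right-hand side of (12) minus the averaged conditional Renyi entropy."""
    lam_x = sum(v ** alpha for v in px)                    # lambda_1(x)
    rhs = (s1(flat, alpha) - s1(px, alpha)) / lam_x
    lhs = renyi(flat, alpha) - renyi(px, alpha)
    return rhs - lhs

for alpha in (0.5, 2.0):
    print(alpha, renyi(px, alpha), s1(px, alpha), bound_gap(alpha))
```

The gap is positive for $\alpha < 1$ and negative for $\alpha > 1$, in line with the reversal of the inequality stated in the text.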



Abe [2] called the ratio on the right-hand side of (12) the averaged
conditional entropy of degree-$\alpha$, instead of the numerator, and obtained it using a
normalized escort averaging of the conditional entropy of degree-$\alpha$.
Notwithstanding the fact that this averaged conditional entropy is no longer the
difference between the joint and marginal entropies, conditions (5) and (6) are
satisfied, and amount to considering expressions (9) and (10) for the rate of
transmission of information and the channel capacity. The normalizing factor
eliminates the vestigial weighting factor in (7) that depends on what was formerly
the conditioning variable. Consequently, the channel capacity, as defined by
(10), will always be positive semidefinite, vanishing either when the input and
output are statistically independent, or when the probabilities for correct and
incorrect channel transmission are equal.

To illustrate the latter, it suffices to consider a binary symmetric channel.
Let $P$ be the probability of incorrect transmission, i.e., $q_{11} = q_{22} = 1 - P$ and
$q_{12} = q_{21} = P$. If $p$ ($q$) is the probability of the input (output) $x = 0$ ($y = 0$)
and $1 - p$ ($1 - q$) that of $x = 1$ ($y = 1$), then the transmission rate, (9), is

$$R = \frac{p^\alpha + (1 - p)^\alpha}{q^\alpha + (1 - q)^\alpha} \cdot \frac{q^\alpha + (1 - q)^\alpha - (1 - P)^\alpha - P^\alpha}{1 - \alpha},$$

and the channel capacity,

$$C = \frac{2^{1 - \alpha} - (1 - P)^\alpha - P^\alpha}{1 - \alpha},$$

vanishes for $P = \frac{1}{2}$, because the channel does not convey any information.
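
The binary symmetric channel can be worked through numerically. The sketch below assumes the entropy of degree-$\alpha$ in the form $S_1(P) = (1 - \sum_i p_i^\alpha)/(\alpha - 1)$ and the rate $R = S_1(x) - S_1(x|y)/\lambda_1(y)$ with $S_1(x|y) = S_1(x, y) - S_1(y)$. It compares a closed-form rate with the rate computed from that definition and evaluates the capacity at $p = 1/2$. Function names and parameter values are illustrative.

```python
import math

def s1(p, alpha):
    """Entropy of degree-alpha, assumed form (1 - sum p_i^alpha)/(alpha - 1)."""
    return (1.0 - sum(v ** alpha for v in p)) / (alpha - 1.0)

def rate(p, P, alpha):
    """Closed-form transmission rate; p = Pr(x = 0), P = error probability."""
    q = p * (1 - P) + (1 - p) * P                  # Pr(y = 0)
    lam_x = p ** alpha + (1 - p) ** alpha
    lam_y = q ** alpha + (1 - q) ** alpha
    return lam_x / lam_y * (lam_y - (1 - P) ** alpha - P ** alpha) / (1 - alpha)

def rate_from_definition(p, P, alpha):
    """R = S_1(x) - (S_1(x, y) - S_1(y)) / lambda_1(y), from the joint distribution."""
    joint = [p * (1 - P), p * P, (1 - p) * P, (1 - p) * (1 - P)]
    q = p * (1 - P) + (1 - p) * P
    y = [q, 1 - q]
    lam_y = sum(v ** alpha for v in y)
    return s1([p, 1 - p], alpha) - (s1(joint, alpha) - s1(y, alpha)) / lam_y

def capacity(P, alpha):
    """Rate evaluated at the symmetric input p = 1/2."""
    return (2 ** (1 - alpha) - (1 - P) ** alpha - P ** alpha) / (1 - alpha)

alpha = 0.7
print(rate(0.3, 0.1, alpha), rate_from_definition(0.3, 0.1, alpha))
print(capacity(0.1, alpha), capacity(0.5, alpha))
```

At $P = 1/2$ the input and output are statistically independent, and the capacity is zero up to rounding error.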

Therefore, at the expense of destroying the relation between the marginal 
and joint entropies, or that between the averaged conditional entropy and the 
difference between the joint and marginal entropies, the 'unphysical' results 
found in |U] are avoided. But this makes it only more apparent that the PAE are 
not fundamental quantities upon which a theory of information, or a generalized 
statistical mechanics, can be constructed. 



3 Characterization of PAE through a coding problem

In order to show the limitations of the entropy measure of degree-$\alpha$ and the
$\alpha$-norm measure, consider possible measures of the length of a code. We have
an alphabet of $D$ symbols, $d_1, \ldots, d_D$, into which the input symbols are to be
encoded. Let $p_1, \ldots, p_m$ be the probabilities of $m$ input symbols, $x_1, \ldots, x_m$,
from an information source. To each $x_i$ we wish to associate a sequence of the
$d$'s, with the only restriction that no sequence of the $d$'s shall be obtainable
from a shorter sequence by the addition of more terms to the shorter [12]. The
sequence associated with $x_i$ has length $\ell_i$, and the average length is
$\sum_{i=1}^{m} p_i \ell_i$.

According to Kraft's theorem, there will be a uniquely decipherable code if
and only if [15]

$$\sum_{i=1}^{m} D^{-\ell_i} \le 1 \tag{13}$$

is satisfied. The condition for equality is $\ell_i = -\ln_D p_i$, since the $p_i$ form a
complete set. But this is not the only optimal code length.
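
A minimal sketch of the Kraft inequality (13) for a binary code alphabet ($D = 2$) follows. It assumes integer code lengths $\ell_i = \lceil -\log_2 p_i \rceil$, a rounding that is not discussed in the text, and uses two made-up source distributions: a dyadic one, for which equality holds, and a generic one, for which the sum is strictly less than one.

```python
import math

def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality: sum_i D^(-l_i)."""
    return sum(D ** (-l) for l in lengths)

def rounded_lengths(p):
    """Integer lengths ceil(-log2 p_i) for a binary code alphabet (D = 2)."""
    return [math.ceil(-math.log2(v)) for v in p]

dyadic = [0.5, 0.25, 0.125, 0.125]     # hypothetical source, powers of 1/2
generic = [0.4, 0.3, 0.2, 0.1]         # hypothetical source, not dyadic

for p in (dyadic, generic):
    lengths = rounded_lengths(p)
    print(lengths, kraft_sum(lengths))
```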

There may be many different codes whose lengths satisfy the Kraft inequality,
(13). In order to determine the optimum code it is necessary to consider
the mean code length, and to minimize it. Such a procedure is valid when the
'cost' of using a sequence of length $\ell_i$ is proportional to the mean length of
a code, $\sum_{i=1}^{m} p_i \ell_i$. However, a more general cost would show an exponential
dependence on the lengths $\ell_i$ [To],

$$C(\tau_2) = \sum_{i=1}^{m} p_i D^{\tau_2 \ell_i}, \tag{14}$$

and span a range of optimal code lengths as a function of the parameter $\tau_2$.

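If $\tau_2$ is restricted to the range $0 < \tau_2 < \infty$ ($0 < \alpha < 1$), the length of a code

$$L(\tau_2) = \frac{1}{\tau_2} \ln_D C(\tau_2), \tag{15}$$

can vary from

$$L(0) = \sum_{i=1}^{m} p_i \ell_i \tag{16}$$

as $\tau_2 \to 0$ ($\alpha \to 1$) to

$$L(\infty) = \lim_{\tau_2 \to \infty} L(\tau_2) = \max_{1 \le i \le m} \ell_i \tag{17}$$

as $\tau_2 \to \infty$ ($\alpha \to 0$), since for large $\tau_2$,

$$\sum_{i=1}^{m} p_i D^{\tau_2 \ell_i} \approx p_k D^{\tau_2 \ell_k}, \tag{18}$$

where $\ell_k$ is the largest of the numbers $\ell_1, \ldots, \ell_m$, and $p_k$ is its probability.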
In the former case, $\ell_i \ge -\ln_D p_i$, and the Shannon entropy becomes the
lower limit to the mean code length jSJ p. 30]

$$\sum_{i=1}^{m} p_i \ell_i \ge -\sum_{i=1}^{m} p_i \ln_D p_i.$$

In the latter case, Hölder's inequality can be used to obtain ]TJ^

$$\sum_{i=1}^{m} p_i D^{\tau_2 \ell_i} \ge \left( \sum_{i=1}^{m} p_i^\alpha \right)^{1/\alpha}, \tag{19}$$

so that

$$L(\tau_2) \ge \frac{1}{\tau_2} \ln_D \left( \sum_{i=1}^{m} p_i^\alpha \right)^{1/\alpha}, \tag{20}$$

for $0 < \tau_2 < \infty$. Over this range, the Rényi measure of uncertainty becomes
the lower limit for the mean code length.

Moreover, the optimal code lengths can be expressed in terms of the
normalized escort probabilities as

$$\ell_i = -\ln_D \frac{p_i^\alpha}{\sum_{k=1}^{m} p_k^\alpha},$$

which again gives the equality in the Kraft inequality (13). Since $\alpha$ lies in the
interval $(0, 1)$, $\ell_i > -\alpha \ln_D p_i$, and no conclusion may be reached in comparison
with the lengths $\ell_i = -\ln_D p_i$, when the mean length is given by the weighted average.
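
The behaviour of the exponential-cost length (15) can be sketched numerically for a binary code alphabet. The code below assumes $\alpha = 1/(1 + \tau_2)$, which follows from $\tau_2 = (1 - \alpha)/\alpha$, and writes `t` for $\tau_2$. It evaluates $L(\tau_2)$ for a fixed set of integer lengths, compares it with the Rényi entropy to base $D$, and checks that the real-valued escort lengths attain both the Kraft equality and the Rényi lower bound. The source probabilities and code lengths are made up for the illustration.

```python
import math

def campbell_length(p, lengths, t, D=2):
    """L(t) = (1/t) log_D sum_i p_i D^(t l_i), the exponential-cost code length."""
    cost = sum(pi * D ** (t * li) for pi, li in zip(p, lengths))
    return math.log(cost, D) / t

def renyi_base_d(p, alpha, D=2):
    """Renyi entropy of order alpha with the logarithm taken to base D."""
    return math.log(sum(pi ** alpha for pi in p), D) / (1.0 - alpha)

def escort_lengths(p, alpha, D=2):
    """Real-valued lengths l_i = -log_D(p_i^alpha / sum_k p_k^alpha)."""
    norm = sum(pi ** alpha for pi in p)
    return [-math.log(pi ** alpha / norm, D) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]     # hypothetical source probabilities
lengths = [2, 2, 3, 4]       # integer lengths satisfying the Kraft inequality

for t in (1e-6, 1.0, 10.0, 200.0):
    alpha = 1.0 / (1.0 + t)  # order of the Renyi entropy paired with t
    print(t, campbell_length(p, lengths, t), renyi_base_d(p, alpha))

# Escort lengths for alpha = 1/2 (t = 1): Kraft sum and attained bound.
opt = escort_lengths(p, 0.5)
print(sum(2 ** (-l) for l in opt), campbell_length(p, opt, 1.0), renyi_base_d(p, 0.5))
```

For very small `t` the length approaches the mean $\sum_i p_i \ell_i$, and for large `t` it approaches the largest length, as in (16) and (17).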

Applying the inequality $\ln x \le x - 1$ to the definition of the mean length (15)
gives a new mean code length which is directly proportional to the cost, viz.,

$$L_{\tau_2} = \frac{C(\tau_2) - 1}{\tau_2} \ge \frac{\left( \sum_{i=1}^{m} p_i^\alpha \right)^{1/\alpha} - 1}{\tau_2}, \tag{21}$$

where the inequality follows from (19). Thus, the $\alpha$-norm entropy measure is
the lower limit to the mean code length of order $\tau_2$, which is proportional to the
cost (14), instead of its logarithm as in (15).

The reason for calling (21) a length is that if all the code lengths were equal
to $\ell$,

$$L_{\tau_2} = \frac{D^{\tau_2 \ell} - 1}{\tau_2},$$

and in the limit as $\tau_2 \to 0$, it would reduce simply to $L(0) = \ell$. More generally,
for unequal code lengths, (21) becomes the weighted average (16) in the same
limit, and the $\alpha$-norm entropy measure becomes the Shannon entropy.
However, in the limit as $\tau_2 \to \infty$, instead of (17), whose finiteness is due to the
presence of the logarithm in the numerator, we now get

$$L_{\tau_2} \approx \frac{p_k D^{\tau_2 \ell_k} - 1}{\tau_2}, \tag{22}$$

on account of (18). There is no finite upper limit to (22).
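
The contrast between the two lengths can be seen numerically. The sketch below writes `t` for $\tau_2$ and compares the logarithmic length, $(1/\tau_2) \log_D C(\tau_2)$, with the length obtained by replacing the logarithm by $C(\tau_2) - 1$. The source probabilities, code lengths and the choice $D = 2$ are made up for the illustration.

```python
import math

def cost(p, lengths, t, D=2):
    """Exponential cost: sum_i p_i D^(t l_i)."""
    return sum(pi * D ** (t * li) for pi, li in zip(p, lengths))

def log_length(p, lengths, t, D=2):
    """Logarithm of the cost to base D, divided by t."""
    return math.log(cost(p, lengths, t, D), D) / t

def linear_length(p, lengths, t, D=2):
    """Length obtained from ln x <= x - 1: (cost - 1) / t."""
    return (cost(p, lengths, t, D) - 1.0) / t

p = [0.4, 0.3, 0.2, 0.1]     # hypothetical source probabilities
lengths = [2, 2, 3, 4]       # hypothetical code lengths

for t in (1.0, 10.0, 50.0):
    print(t, log_length(p, lengths, t), linear_length(p, lengths, t))
```

The logarithmic length never exceeds the largest code length, while the second quantity grows exponentially with `t`.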

The lack of an upper bound on the mean code length in the $\tau_2 = \infty$ limit
is in contradiction with the well-known bounds on the average code length for
$M$-extensions of a code, which consist of all possible concatenations of the $m$
symbols of the original source code. In the limit $\alpha \to 0$, the Rényi measure of
uncertainty becomes the Hartley entropy, and, for the $M$th extension, the mean code
length is bounded by [Tr)]:

$$M S_0 \le L_M(\infty) < M S_0 + 1. \tag{23}$$

The total entropy is $M$ times as large as the original Hartley entropy,
$S_0 = \ln_D m$, and the averaged code length $L_M(\infty) = \max \ell(s)$, where $s$ is an input
sequence of length $M$. By choosing $M$ sufficiently large, the average code length
per extension, $L_M(\infty)/M$, can be made to come as close to the Hartley entropy
as desired in the $\tau_2 = \infty$ limit.
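
The approach to the Hartley entropy can be illustrated with a small sketch. It assumes a block code in which every one of the $m^M$ input sequences of length $M$ is assigned the same integer length, the smallest $L$ with $D^L \ge m^M$, so that the maximum length equals that block length. The values $m = 3$ and $D = 2$ are arbitrary choices.

```python
import math

def block_length(m, M, D=2):
    """Smallest integer L with D**L >= m**M (exact integer arithmetic)."""
    L = 0
    while D ** L < m ** M:
        L += 1
    return L

m, D = 3, 2
hartley = math.log(m, D)                 # Hartley entropy log_D m of the source

for M in (1, 10, 100):
    per_symbol = block_length(m, M, D) / M
    print(M, per_symbol, per_symbol - hartley)
```

The excess of the length per source symbol over the Hartley entropy stays below $1/M$, so it can be made as small as desired by increasing $M$.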

Consequently, the approximation of the Rényi measure of uncertainty by the
PAE is not valid in the $\tau_2 = \infty$ ($\alpha = 0$) limit. Whereas the Rényi entropy
transforms into the Hartley entropy, and provides upper and lower bounds on
the mean code length per extension, as given by (23), the PAE do not provide
similar bounds on the code length (22) because the latter is unbounded in the
$\tau_2 = \infty$ limit. As a matter of fact, the PAE reduce to the Hartley entropy in
the $\alpha = 1$ limit, but only after the probabilities have been set equal. Therefore
the PAE have a more limited range of validity than the Rényi entropy which
they approximate. This, together with their problems in handling statistically
independent events and correlations among statistically dependent ones, does not
make them suitable as a basis for either a generalized information theory or a
generalized statistical mechanics.